Bayesian Attention Modules
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
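The construction described above, drawing from a reparameterizable distribution and normalizing onto the simplex, can be sketched as follows. This is a minimal illustration, not the paper's exact parameterization: the function name, the log-normal choice, and the fixed noise scale `sigma` are all assumptions.

```python
import math
import random

def stochastic_attention(scores, sigma=1.0, rng=random):
    """Simplex-constrained stochastic attention: normalize reparameterizable
    log-normal draws whose log-means are the deterministic attention scores.
    (A minimal sketch; the log-normal choice and fixed sigma are illustrative.)"""
    # Reparameterization: w_i = exp(score_i + sigma * eps_i), eps_i ~ N(0, 1),
    # so the sampled weights are a differentiable function of the scores.
    weights = [math.exp(s + sigma * rng.gauss(0.0, 1.0)) for s in scores]
    total = sum(weights)
    # Normalizing positive weights places the vector on the probability simplex.
    return [w / total for w in weights]

attn = stochastic_attention([0.5, 1.2, -0.3])  # one stochastic attention draw
```

Because the draw is an explicit function of the scores and standard Gaussian noise, gradients flow through the scores during training, which is what makes the procedure differentiable.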
Bayesian Attention Modules: Appendix A (Algorithm 1: Bayesian Attention Modules)
We follow the same architectural hyperparameters as in Veličković et al. We adopt hypothesis testing to quantify the uncertainty of a model's prediction. Acc(ans) = min{(# humans that said ans) / 3, 1}. By stacking MCA layers, MCAN enables deep interactions between the question and image features. We conduct experiments on an attention-based model for image captioning, Att2in, in Rennie et al.
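The VQA accuracy rule quoted above translates directly into code. Function and argument names here are illustrative; the rule itself, Acc(ans) = min{(# humans that said ans) / 3, 1}, is taken from the text.

```python
def vqa_accuracy(answer, human_answers):
    """VQA accuracy: an answer is fully correct (1.0) if at least three
    human annotators gave it; otherwise it earns partial credit in
    increments of 1/3 per agreeing annotator."""
    return min(human_answers.count(answer) / 3.0, 1.0)

humans = ["cat", "cat", "dog", "cat", "bird"]
full = vqa_accuracy("cat", humans)     # three annotators agree -> 1.0
partial = vqa_accuracy("dog", humans)  # one annotator agrees -> 1/3
```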
Review for NeurIPS paper: Bayesian Attention Modules
The improvement over deterministic attention seems marginal on some tasks, such as VQA and machine translation. Unlike prior work on discrete latent variables, which can make interpretability claims, I'm not sure why we want to model soft attention as a latent variable. Is the stochastic version better in low-resource scenarios? Also, can you further quantify the benefits of modeling attention uncertainty, or qualitatively show a few samples from the attention distribution and check whether they truly reflect the underlying uncertainty?
Review for NeurIPS paper: Bayesian Attention Modules
This paper proposes treating attention weights as continuous latent variables, trained in a variational autoencoder framework. It uses reparameterizable distributions, such as the Weibull and log-normal distributions, to obtain unnormalized weights, which are then normalized. Experiments show that the proposed continuous latent attention mechanism outperforms deterministic attention on a wide variety of tasks, including image captioning, machine translation, graph node classification, and fine-tuning BERT. All reviewers recommended acceptance, pointing out that this is an interesting idea and a solid, well-executed work. One concern was raised about the significance of the improvements on VQA and NMT, and another about directly setting the prior to the approximate posterior; the authors addressed both in the rebuttal.
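As a concrete instance of the reparameterizable distributions the meta-review names, a Weibull draw can be expressed through its inverse CDF. This is a hedged sketch under standard textbook facts about the Weibull distribution; the function name and the shape/scale values are illustrative, not from the paper.

```python
import math
import random

def weibull_rsample(k, lam, rng=random):
    """Reparameterized Weibull sample via the inverse CDF.
    With u ~ Uniform(0, 1), x = lam * (-log(1 - u)) ** (1 / k) follows
    Weibull(shape=k, scale=lam) and is differentiable in k and lam,
    which is what makes the distribution reparameterizable."""
    u = rng.random()
    return lam * (-math.log(1.0 - u)) ** (1.0 / k)

# Per-key positive draws give unnormalized weights; normalizing across
# keys yields an attention vector on the probability simplex.
draws = [weibull_rsample(2.0, 1.0) for _ in range(4)]
attn = [d / sum(draws) for d in draws]
```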